Polypeptide, and a DNA sequence to which a protein containing the polypeptide binds

ABSTRACT

The present invention provides a polypeptide common to the proteins expressed from the glial cells missing (gcm) gene, a cell fate determination gene of the animal central nervous system, which polypeptide has the amino acid sequence of Sequence ID No. 1 or a part thereof, and a DNA sequence to which proteins containing the polypeptide bind. The polypeptide and the DNA sequence are expected to contribute to clarifying the mechanism by which the human nervous system forms and, at the same time, to open the way to developing diagnostic and medical treatment methods for cerebral functional diseases and cerebral tumors.

FIELD OF THE INVENTION

[0001] The present invention relates to a polypeptide common to the proteins expressed from the glial cells missing (gcm) gene, a cell fate determination gene of the animal central nervous system, and to a DNA sequence on animal genomic DNA to which the polypeptide binds. More particularly, the present invention relates to a polypeptide present in a novel DNA-binding protein and to a specific DNA sequence on genomic DNA to which this protein binds, which are useful as research materials for investigating the mechanism of central nervous system formation in various animal species including human, and for developing means of diagnosis and medical treatment of various diseases caused by structural and functional disorders of the central nervous system.

BACKGROUND ART OF THE INVENTION

[0002] The central nervous system can be regarded as a giant information processor comprising a huge number of cells, and its component cells may be broadly divided into neurons and glial cells. Neurons are considered to play a central role in information processing by forming a network, while glial cells support the formation and functioning of the network and play an important role in repairing damage to it. Accurate formation of these two cell types, which play the different roles described above, is an essential prerequisite for the central nervous system to function normally. In clarifying the mechanism by which the complicated nervous system forms and in investigating its functions, it is therefore very important to understand how cellular diversity arises and how differentiation is determined during development of the nervous system.

[0003] The fact that neurons and glial cells are produced from common precursor cells (stem cells) has so far been reported for many species. However, it has not as yet been clarified by what mechanism individual stem cells are differentiated into neurons and glial cells. While it is known that some molecules exert an effect on the determination of differentiation, it has not been shown that this effect is necessary for determination of glia vs. neuronal cell fate.

[0004] Drosophila is a model organism having a long history as an experimental material in genetics. The most important advantage of using Drosophila is the possibility of rapidly identifying an unknown gene participating in an interesting biological process through screening of mutants. It is now possible to determine individual cells of the central nervous system and know the generating process thereof in detail as a result of preparation of cell markers using various antibodies or lacZ gene and establishment of dye injecting techniques into a single cell. The central nervous system of Drosophila is accordingly becoming an excellent model for research efforts on generation of cells forming the nervous system and differentiation determination mechanism.

[0005] The central nervous system of Drosophila is formed from 30 neuroblasts per side of each segment. Neuroblasts produce both neurons and glial cells by repeating stem cell-like asymmetric divisions, and about 300 neurons and about 30 glial cells are created per side of each segment before the nervous system is complete. Each of the 30 neuroblasts is formed at a predetermined time and at a predetermined position, which permits individual identification, so that a name is assigned to each of them. Following the cells produced by specific neuroblasts further permits observation of the generation of specific neurons and glial cells. In Drosophila, therefore, the fate of each cell of the nervous system is almost completely determined genetically.

[0006] The present inventors screened mutants of Drosophila for the purpose of identifying genes participating in differentiation of the nervous system, and as a result obtained a mutant strain having an abnormality in the formation of glial cells. The resultant mutant was therefore named glial cells missing (gcm), and the gcm gene was specified as the causative gene of this mutant (Cell 82: 1025-1036, 1995). In the gcm mutant, cells that would normally become glia differentiate into neurons; conversely, misexpression of the gcm gene in neuroblasts causes presumptive neurons to differentiate into glia. These findings suggest that the gcm gene controls the determination of fate between neurons and glial cells.

[0007] Apart from the present inventors, three groups carried out analysis independently as to this gcm gene, and obtained results supporting the conclusion as described above (Cell 82: 1013-1023, 1995; Genetics 139: 1663-1678, 1995; Development 122: 131-139, 1996; Cell 83: 671-674, 1995; Neuron 15: 1219-1222, 1995).

[0008] As described above, it is now clear that the gcm gene plays an important role in determining the fate of nervous system cells. However, the amino acid sequence deduced from the nucleotide sequence of its cDNA exhibits almost no homology with any of the proteins in the database, so that the functions of the protein (GCM) expressed from the gcm gene have been unknown.

[0009] In order to accurately understand functions of a particular gene, in general, it is essential to clarify the physiological activity of the expression product. It is therefore very important to specify the role of GCM in the central nervous system, with a view to elucidating the formation mechanism of the central nervous system as well as to develop diagnosis and medical treatment techniques of central nervous system diseases.

[0010] The present invention was made in view of the circumstances described above, and has an object of providing a polypeptide and a DNA sequence which permit a wider range of industrial uses of the gcm gene and of the protein expressed therefrom, by clarifying the functions of GCM, the protein expressed from the gcm gene in nervous system cells.

SUMMARY OF THE INVENTION

[0011] The present invention provides a polypeptide common to the proteins expressed from the gcm gene, a cell fate determination gene of the animal central nervous system, which polypeptide has the amino acid sequence of Sequence ID No. 1 or a part thereof.

[0012] The invention provides also a polypeptide, of which one or more amino acid residues in the Sequence ID No. 1 or a part thereof are substituted, deleted or added.

[0013] Further, the invention provides a DNA sequence of animal genomic DNA to which a protein containing the foregoing polypeptide binds, which DNA sequence has the nucleotide sequence of Sequence ID No. 2 or a homologous sequence thereto, and a vector holding such DNA sequence.

[0014] In the amino acid sequence of Sequence ID No. 1, Xaa represents an arbitrary amino acid residue, and in the nucleotide sequence of Sequence ID No. 2, R represents A or G.
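The degenerate notation of Sequence ID No. 2 can be made concrete with a minimal sketch such as the following, which lists every specific sequence that "rcccgcat" stands for. The function name `expand_degenerate` and the lookup table are illustrative assumptions and are not part of the original disclosure; only the code R (A or G) is taken from the text.

```python
from itertools import product

# Nucleotide codes used in Sequence ID No. 2; per the text, R stands for A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG"}


def expand_degenerate(site: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate site (hypothetical helper)."""
    choices = [IUPAC[base] for base in site.upper()]
    return ["".join(combo) for combo in product(*choices)]


GCM_SITE = "RCCCGCAT"  # Sequence ID No. 2
print(expand_degenerate(GCM_SITE))  # ['ACCCGCAT', 'GCCCGCAT']
```

With a single degenerate position, the eight-base sequence therefore corresponds to exactly two concrete octamers.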

[0015] The polypeptide provided by the present invention and the DNA sequence to which a protein containing this polypeptide binds are useful as research materials for investigating the mechanism of central nervous system formation in various animal species and the functions thereof. These materials are applicable also to diagnosis and medical treatment of various diseases caused by structural or functional disturbances of the central nervous system (for example, glial cerebral tumor). In the future, it is expected to become possible to separate blast glial cells from a neonate, grow and store them in culture in vitro and, should that person suffer from a cerebral functional disorder as an adult, to control the gcm gene in the stored cells by means of the foregoing protein or polypeptide, convert the blast glial cells into neurons, and return the converted neurons to the brain to restore function: the “self-brain cell transplantation technique”.

BRIEF DESCRIPTION OF THE DRAWINGS

[0016] FIG. 1A is a constitutional diagram of the GCM fusion proteins used in the gel shift assay; and

[0017] FIG. 1B is an electrophoresis image showing the result of the gel shift assay;

[0018] FIG. 2A illustrates the matching ratio of the GCM binding sequence; and FIG. 2B is an electrophoresis image showing the result of the competition assay;

[0019] FIG. 3A illustrates GCM binding sites in the upstream region of the repo gene; and

[0020] FIG. 3B shows the DNA sequences of these sites;

[0021] FIG. 4 illustrates GCM amino acid sequences of human, mouse and Drosophila, respectively; and

[0022] FIG. 5 illustrates partial amino acid sequences of GCM from mouse and Drosophila.

DETAILED DESCRIPTION OF THE INVENTION

[0023] The polypeptide of the invention can be isolated through hydrolysis or the like of GCM protein expressed in the central nervous system of an animal. It is also possible to prepare the polypeptide of the invention through chemical synthesis on the basis of the amino acid sequence provided by the invention. The polypeptide of the invention includes a peptide fragment containing any partial amino acid sequence of Sequence ID No. 1 (five or more amino acid residues).

[0024] The DNA sequence of the invention is a DNA sequence common to the genomic DNA sequences to which GCM protein binds, and can be prepared as a DNA fragment by synthesis with a known method. Such a DNA fragment is applicable as a probe for specifying a gene to which GCM protein binds, or as a primer for PCR-amplifying such a gene.
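Because genomic DNA is double-stranded, a probe or primer based on Sequence ID No. 2 may also need to be considered in its complementary orientation. The following is a minimal sketch of that conversion; the helper `reverse_complement` and its lookup table are illustrative assumptions (Y, meaning C or T, is the standard complement of R), and the source does not state how probes or primers were actually designed.

```python
# Complement table including the degenerate code R (A or G) and its complement Y (C or T).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "R": "Y", "Y": "R"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a (possibly degenerate) DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


print(reverse_complement("RCCCGCAT"))  # ATGCGGGY
```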

[0025] Now, the polypeptide and the DNA sequence of the invention will be described in detail on the basis of an experimental study carried out by the present inventors.

[0026] (1) DNA binding properties of GCM

[0027] The gcm gene of Drosophila encodes a 504 amino acid protein. It is clear, after searching for amino acid sequences in the database, that this protein has no motif characterized to date, other than a nuclear localization signal. This protein GCM was consequently confirmed to be quite a novel nuclear protein. Because this protein brings about expression and other changes of many genes, it was most simply assumed that GCM directly binds to genomic DNA and serves as a transcription regulating factor controlling the expression of its target genes. With a view to studying this possibility, the DNA binding properties of GCM were investigated.

[0028] The upstream region of the repo gene of Drosophila was selected as a target of GCM binding. The homeobox gene repo is expressed in virtually all glial cells from an early stage onwards, and this expression is strongly dependent on gcm. Therefore, on the assumption that GCM is a transcription regulating factor, the repo gene is considered a suitable candidate target gene.

[0029] a) Procedures

[0030] Fusion proteins of various regions of GCM and maltose-binding protein were prepared as expression products of Escherichia coli, and whether or not they bind to various DNA fragments prepared from the genomic upstream region of the repo gene was investigated by the gel shift assay.

[0031] Production of fusion proteins: Fusion proteins were prepared using a protein fusion and purification system (made by New England Biolabs). Various segments of the gcm gene were amplified by the PCR method with Pfu polymerase (TOYOBO) and inserted into a pMAL™-cz vector, which was then introduced into Escherichia coli BL21(DE3)pLysS. Expression of the fusion proteins and of MBP-lacZp, which was used as a control for the gel shift assay, was induced in the transformed Escherichia coli; the cells were then sonicated, and the proteins were extracted and affinity-purified using amylose resin. The structure of each fusion protein is as shown in FIG. 1A: 1 is the control; 2-5 are fusion proteins; and 6 is a schematic view of a normal GCM. The figures attached to each constitutional diagram represent positions of the amino acid residues of GCM.

[0032] Preparation of DNA fragments in repo gene upstream region: Clones containing the repo gene were isolated from a Drosophila genomic library. After confirming the nucleotide sequence thereof, the repo gene upstream region of about 7 kb was excised with multiple restriction enzymes to prepare 60-460 bp DNA fragments.

[0033] Gel shift assay: Each fusion protein in an amount of about 200 ng was incubated in accordance with a known method (Cell 64: 439-446, 1991; EMBO J. 10:2965-2973, 1991), with 100 bp of ³²P-labeled DNA at 25° C. for 30 minutes, and then subjected to electrophoresis at 4° C. on a polyacrylamide gel.

[0034] b) Results

[0035] The results of the gel shift assay are shown in the electrophoresis image of FIG. 1B. The lane numbers correspond to the proteins whose structures are shown in FIG. 1A. The fusion proteins 2 (N243) and 4 (N181) bound to the DNA fragments in the repo gene upstream region (arrow B in the drawing). This confirms that GCM is a DNA-binding protein and, at the same time, suggests that the DNA-binding activity resides in the region extending from the amino terminal of GCM up to the 181st amino acid. Since the amino acid sequence up to the 181st residue does not exhibit homology with any known protein, it was found to be a novel DNA-binding domain.

[0036] (2) DNA sequence to which GCM binds

[0037] It was investigated whether or not the DNA-binding domain of GCM binds by recognizing a specific sequence.

[0038] a) Procedures

[0039] Gel shift assay: In accordance with a known method (Science 250: 1104-1110, 1990; Cell 64: 459-470, 1991; Science 257: 1951-1955, 1992; Cell 68: 283-302, 1992), probes were prepared from oligonucleotides in which a 15 bp random sequence was inserted between the nucleotide sequences of Sequence ID Nos. 3 and 4, and a gel shift assay was carried out in the same manner as above by reaction with fusion protein N243. Oligonucleotides bound to the fusion protein were isolated and PCR-amplified with the sequences of Sequence ID Nos. 3 and 4 as primers, and the resultant PCR products and the fusion protein were again examined by gel shift assay. This cycle was repeated three times, and the sequences of the finally obtained PCR products were determined.
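The probe layout used in this selection can be sketched as follows, under the assumption (inferred from Sequence ID Nos. 5 and 6, each of which reads as Sequence ID No. 3, a 15-base insert, then Sequence ID No. 4) that the random 15-mer lies between the two fixed flanking sequences. The function names and the random generator are hypothetical and only illustrate how the variable insert of a selected clone could be recovered for sequence comparison.

```python
import random

SEQ_ID_3 = "tgtgtggaattgtgagcgga"  # 20-nt fixed flank (Sequence ID No. 3)
SEQ_ID_4 = "ggttttcccagtcacgacg"   # 19-nt fixed flank (Sequence ID No. 4)
INSERT_LEN = 15                    # length of the random sequence


def make_library_oligo(rng: random.Random) -> str:
    """Build one library member: flank + random 15-mer + flank (assumed layout)."""
    insert = "".join(rng.choice("acgt") for _ in range(INSERT_LEN))
    return SEQ_ID_3 + insert + SEQ_ID_4


def extract_insert(oligo: str) -> str:
    """Return the variable insert of an oligonucleotide carrying both flanks."""
    if not (oligo.startswith(SEQ_ID_3) and oligo.endswith(SEQ_ID_4)):
        raise ValueError("oligonucleotide lacks the expected flanking sequences")
    insert = oligo[len(SEQ_ID_3):len(oligo) - len(SEQ_ID_4)]
    if len(insert) != INSERT_LEN:
        raise ValueError("insert does not have the expected length")
    return insert


# Sequence ID No. 5 from the sequence listing follows the same layout.
SEQ_ID_5 = "tgtgtggaattgtgagcggacctacccgcattacgggttttcccagtcacgacg"
print(extract_insert(SEQ_ID_5))  # cctacccgcattacg
```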

[0040] Competition assay: Competitor oligonucleotides having the nucleotide sequences of Sequence ID Nos. 5 and 6 were first allowed to react with fusion protein N243, a labeled 200 bp DNA fragment (FIGS. 3A and 3B) was then added, and a gel shift assay was carried out in the same manner as above.

[0041] b) Results

[0042] Forty-eight oligonucleotide clones were obtained through three repeated cycles of the gel shift assay with the fusion protein and PCR amplification of the protein-bound oligonucleotides. Among them, 39 clones shared the nucleotide sequence shown by Sequence ID No. 2. Eleven clones had sequences that contained only one base mismatch, and the remaining three clones had only two base mismatches. The match ratio at each base position is as shown in FIG. 2A, which shows high agreement within a range of from 87 to 100%. This result strongly suggests that GCM protein specifically recognizes the nucleotide sequence of Sequence ID No. 2 upon binding to DNA.

[0043] Sequence ID No. 2 was confirmed to be a GCM-binding sequence also by the results of the competition assay (FIG. 2B). More specifically, when the protein was first reacted with the competitor oligonucleotide of Sequence ID No. 5, which includes the sequence of Sequence ID No. 2 (FIG. 2B-a), binding to the probe was lost in a manner dependent on the competitor concentration (1, 10, 100 and 1,000 times), whereas with Sequence ID No. 6, which does not include the sequence of Sequence ID No. 2, no competitive action against the probe was observed.
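The difference between the two competitor oligonucleotides can be checked directly from the sequence listing. The sketch below, which is illustrative only, searches the listed strand of Sequence ID Nos. 5 and 6 for the sequence of Sequence ID No. 2, written as a regular expression in which R becomes the character class [AG]. The helper name `has_gcm_site` is an assumption, and the complementary strand is not examined.

```python
import re

# Sequence ID No. 2 ("rcccgcat") as a regular expression; R = A or G.
GCM_SITE = re.compile(r"[AG]CCCGCAT", re.IGNORECASE)

COMPETITORS = {
    "Sequence ID No. 5": "tgtgtggaattgtgagcggacctacccgcattacgggttttcccagtcacgacg",
    "Sequence ID No. 6": "tgtgtggaattgtgagcggatatactaatttgttaggttttcccagtcacgacg",
}


def has_gcm_site(seq: str) -> bool:
    """Report whether the listed strand contains the Sequence ID No. 2 motif."""
    return GCM_SITE.search(seq) is not None


for name, seq in COMPETITORS.items():
    print(name, "contains the motif:", has_gcm_site(seq))
```

Only Sequence ID No. 5 contains the motif on the listed strand, which is consistent with its behavior as the effective competitor.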

[0044] (3) GCM binding sites in the upstream region of the repo gene

[0045] On the assumption that repo is a target gene of GCM, the upstream region of the repo gene was searched for GCM binding sites.

[0046] Firstly, gel shift assays were carried out by the use of 21 non-overlapping DNA fragments (FIG. 3A: bottom horizontal bar) and the fusion protein N243 to specify the DNA fragments to which the protein binds.

[0047] As a result, N243 bound to eight DNA fragments (FIG. 3A: bottom thick horizontal bar). On the other hand, C261 did not bind to any of the DNA fragments.

[0048] Then, the DNA sequence of the 7 kb upstream region was investigated, and it was found that eleven GCM-binding sequences (Sequence ID No. 2) existed within the 4 kb upstream region, while no binding sequence was present in the −4 to −7 kb upstream region. The sequences of these 11 sites are as shown in FIG. 3B: sequences matching at seven or more of the eight bases were counted as GCM-binding sites. All eight of the foregoing DNA fragments binding to N243 contained one or more GCM-binding sequences.
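The counting rule described here (at least seven of eight bases matching Sequence ID No. 2) can be sketched as a sliding-window scan. In the example below, the eleven 14-base fragments of Sequence ID Nos. 7 to 17 from the sequence listing are used as test input; that these correspond to the eleven sites of FIG. 3B is an assumption, as are the function names. Only the listed strand is scanned.

```python
# Codes used in Sequence ID No. 2; R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG"}
SITE = "RCCCGCAT"  # Sequence ID No. 2


def count_matches(window: str, site: str = SITE) -> int:
    """Count positions at which the window agrees with the degenerate site."""
    return sum(base in IUPAC[code] for base, code in zip(window.upper(), site))


def find_sites(seq: str, min_matches: int = 7) -> list[tuple[int, str, int]]:
    """Return (offset, window, score) for each window with enough matching bases."""
    hits = []
    for i in range(len(seq) - len(SITE) + 1):
        window = seq[i:i + len(SITE)]
        score = count_matches(window)
        if score >= min_matches:
            hits.append((i, window, score))
    return hits


# Sequence ID Nos. 7-17 from the sequence listing (assumed to be the FIG. 3B sites).
FRAGMENTS = {
    7: "aaatcccgcataaa", 8: "catgcccgcgtcgc", 9: "atggcgcgcatcca",
    10: "catacccgcagttt", 11: "cagacccacataat", 12: "tttgcccgcatttc",
    13: "ccggcccgcattga", 14: "aatgcccacatgat", 15: "agtaccagcatctg",
    16: "accaccctcatgag", 17: "cgcgccagcatgaa",
}

for seq_id, fragment in FRAGMENTS.items():
    print(seq_id, find_sites(fragment))
```

Under this rule every one of the eleven fragments yields a hit at the fourth base of the fragment, and the fragments of Sequence ID Nos. 12 and 13 match at all eight positions.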

[0049] The fact that as many as eleven GCM-binding sequences are clustered in the upstream region of the repo gene suggests that GCM directly controls expression of the repo gene as a transcription regulating factor.

[0050] (4) GCM common sequence in different species of animal

[0051] In order to find whether the DNA binding domain is conserved throughout evolution, mammalian homologues were compared with Drosophila GCM.

[0052] A human gene (hGCMa) was derived from the EST database. Because the sequences in this database were not complete, a complete coding sequence was prepared by the 5′-RACE (rapid amplification of cDNA ends) and 3′-RACE methods. Mouse genes (mGCMa and mGCMb) were isolated by using the conserved region between GCM and hGCMa. More specifically, mGCMa was prepared from mouse placenta poly(A)+ RNA by the reverse transcriptase PCR (RT-PCR) method, and mGCMb from mouse brain poly(A)+ RNA by the RT-PCR method.

[0053] Comparison of the amino acid sequences of the proteins deduced from these human and mouse genes with the amino acid sequence of GCM revealed strong conservation of the highly basic amino-terminal one-third, as shown in FIG. 4. An evolutionarily conserved motif could be unambiguously defined from these comparisons, the defined motif being named the “gcm-motif”. Further, this motif corresponds also to the DNA-binding domain of GCM (1-181 amino acid residues). Comparison of the individual sequences reveals the presence of three absolutely conserved stretches of nine to ten amino acid residues (A, B and C in FIG. 4), and seven conserved cysteine and four conserved histidine residues. In contrast to the highly conserved gcm-motif at the amino-terminal regions, the carboxy-terminal regions mostly have no similarity to each other nor to any known proteins.
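The residue counts stated above can be compared with Sequence ID No. 1 by a short tally. The sketch below copies the residues of Sequence ID No. 1 from the sequence listing, including the position numbers, discards the numbers and counts the residues that are fixed (not Xaa). Reading the fixed Cys and His residues of Sequence ID No. 1 as the conserved cysteine and histidine residues of this paragraph is an assumption made for illustration.

```python
from collections import Counter

# Sequence ID No. 1 as it appears in the sequence listing (position numbers included).
SEQ_ID_1_LISTING = """
Trp Asp Ile Asn Asp Xaa Xaa Xaa Pro Xaa Xaa Xaa Xaa Xaa Xaa Asp 1 5 10 15
Xaa Phe Xaa Xaa Trp Xaa Xaa Xaa Xaa Xaa Xaa Xaa Ile Tyr Ser Xaa 20 25 30
Xaa Xaa Xaa Xaa Ala Xaa Xaa His Xaa Ser Xaa Trp Ala Met Arg Asn 35 40 45
Thr Asn Asn His Asn Xaa Xaa Ile Leu Lys Lys Ser Cys Leu Gly Val 50 55 60
Xaa Xaa Cys Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Gly Xaa Xaa Xaa Xaa 65 70 75 80
Leu Arg Pro Ala Ile Cys Asp Lys Ala Arg Xaa Lys Gln Gln Xaa Lys 85 90 95
Xaa Cys Pro Xaa Xaa Asn Cys Xaa Xaa Xaa Leu Xaa Xaa Xaa Xaa Cys 100 105 110
Arg Gly His Xaa Gly Xaa Pro Val Thr Xaa Phe Trp Arg Xaa Asp Gly 115 120 125
Xaa Xaa Ile Xaa Phe Gln Xaa Lys Gly Xaa His Asp Xaa Pro Xaa Pro 130 135 140
Glu Xaa Lys Xaa Xaa Xaa Glu Xaa Arg Arg 145 150
"""

# Keep only the three-letter residue codes; position numbers are dropped.
residues = [token for token in SEQ_ID_1_LISTING.split() if token.isalpha()]
counts = Counter(residues)

print(len(residues))   # 154
print(counts["Cys"])   # 7
print(counts["His"])   # 4
```

The tally gives 154 residues in total, with seven fixed cysteine and four fixed histidine positions, in agreement with the numbers stated in the text.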

[0054] Gel shift assays and competition assays were carried out in the same manner as above by using a protein obtained by expressing a DNA fragment encoding hGCMa up to the 171st amino acid residue, giving results similar to those of GCM. These results also suggest that the gcm-motif is a sequence containing a specific DNA-binding domain.

[0055] Further, the presence of the gcm-motif was confirmed also in the RT-PCR product (mGCMa2) obtained with mouse brain poly(A)+ RNA as a template and in the PCR product (dGCM2) obtained with Drosophila genomic DNA as a template (FIG. 5). This suggests that the gcm-motif is conserved in common among many animal species, and that the proteins containing this sequence form a novel family of DNA-binding proteins.

[0056] According to the present invention, as described above in detail, there are provided a polypeptide which exists in common in the GCM proteins expressed from the glial cells missing (gcm) gene of the central nervous system of various animals and which has a specific amino acid sequence functioning as the DNA-binding domain of the GCM protein, and a DNA sequence to which the GCM protein binds. This contributes to clarifying the molecular mechanism by which the human nervous system forms and, at the same time, opens the way to developing diagnostic and medical treatment methods for cerebral functional diseases and cerebral tumors using the gcm gene, the proteins expressed therefrom and the peptide.

SEQUENCE LISTING

1 23 1 154 PRT Artificial Sequence Description of Artificial Sequence A POLYPEPTIDE COMMON TO GCM PROTEINS OF VARIOUS SPECIES 1 Trp Asp Ile Asn Asp Xaa Xaa Xaa Pro Xaa Xaa Xaa Xaa Xaa Xaa Asp 1 5 10 15 Xaa Phe Xaa Xaa Trp Xaa Xaa Xaa Xaa Xaa Xaa Xaa Ile Tyr Ser Xaa 20 25 30 Xaa Xaa Xaa Xaa Ala Xaa Xaa His Xaa Ser Xaa Trp Ala Met Arg Asn 35 40 45 Thr Asn Asn His Asn Xaa Xaa Ile Leu Lys Lys Ser Cys Leu Gly Val 50 55 60 Xaa Xaa Cys Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Gly Xaa Xaa Xaa Xaa 65 70 75 80 Leu Arg Pro Ala Ile Cys Asp Lys Ala Arg Xaa Lys Gln Gln Xaa Lys 85 90 95 Xaa Cys Pro Xaa Xaa Asn Cys Xaa Xaa Xaa Leu Xaa Xaa Xaa Xaa Cys 100 105 110 Arg Gly His Xaa Gly Xaa Pro Val Thr Xaa Phe Trp Arg Xaa Asp Gly 115 120 125 Xaa Xaa Ile Xaa Phe Gln Xaa Lys Gly Xaa His Asp Xaa Pro Xaa Pro 130 135 140 Glu Xaa Lys Xaa Xaa Xaa Glu Xaa Arg Arg 145 150 2 8 DNA Artificial Sequence Description of Artificial Sequence A DNA SEQUENCE TO WHICH A GCM PROTEIN COMPRISING THE AMINO ACID OF SEQ ID NO 1 BINDS 2 rcccgcat 8 3 20 DNA Artificial Sequence Description of Artificial Sequence SYNTHETIC DNA 3 tgtgtggaat tgtgagcgga 20 4 19 DNA Artificial Sequence Description of Artificial Sequence SYNTHETIC DNA 4 ggttttccca gtcacgacg 19 5 54 DNA Artificial Sequence Description of Artificial Sequence SYNTHETIC DNA 5 tgtgtggaat tgtgagcgga cctacccgca ttacgggttt tcccagtcac gacg 54 6 54 DNA Artificial Sequence Description of Artificial Sequence SYNTHETIC DNA 6 tgtgtggaat tgtgagcgga tatactaatt tgttaggttt tcccagtcac gacg 54 7 14 DNA Artificial Sequence Description of Artificial Sequence DNA FRAGMENT 7 aaatcccgca taaa 14 8 14 DNA Artificial Sequence Description of Artificial Sequence DNA FRAGMENT 8 catgcccgcg tcgc 14 9 14 DNA Artificial Sequence Description of Artificial Sequence DNA FRAGMENT 9 atggcgcgca tcca 14 10 14 DNA Artificial Sequence Description of Artificial Sequence DNA FRAGMENT 10 catacccgca gttt 14 11 14 DNA Artificial Sequence Description of Artificial Sequence DNA FRAGMENT 11 cagacccaca taat 14 12 
14 DNA Artificial Sequence Description of Artificial Sequence DNA FRAGMENT 12 tttgcccgca tttc 14 13 14 DNA Artificial Sequence Description of Artificial Sequence DNA FRAGMENT 13 ccggcccgca ttga 14 14 14 DNA Artificial Sequence Description of Artificial Sequence DNA FRAGMENT 14 aatgcccaca tgat 14 15 14 DNA Artificial Sequence Description of Artificial Sequence DNA FRAGMENT 15 agtaccagca tctg 14 16 14 DNA Artificial Sequence Description of Artificial Sequence DNA FRAGMENT 16 accaccctca tgag 14 17 14 DNA Artificial Sequence Description of Artificial Sequence DNA FRAGMENT 17 cgcgccagca tgaa 14 18 436 PRT Homo sapiens 18 Met Glu Pro Asp Asp Ser Asp Ser Glu Asp Lys Glu Ile Leu Ser Trp 1 5 10 15 Asp Ile Asn Asp Val Lys Leu Pro Gln Asn Val Lys Lys Thr Asp Trp 20 25 30 Phe Gln Glu Trp Pro Asp Ser Tyr Ala Lys His Ile Tyr Ser Ser Glu 35 40 45 Asp Lys Asn Ala Gln Arg His Leu Ser Ser Trp Ala Met Arg Asn Thr 50 55 60 Asn Asn His Asn Ser Arg Ile Leu Lys Lys Ser Cys Leu Gly Val Val 65 70 75 80 Val Cys Gly Arg Asp Cys Leu Ala Glu Glu Gly Arg Lys Ile Tyr Leu 85 90 95 Arg Pro Ala Ile Cys Asp Lys Ala Arg Gln Lys Gln Gln Arg Lys Arg 100 105 110 Cys Pro Asn Cys Asp Gly Pro Leu Lys Leu Ile Pro Cys Arg Gly His 115 120 125 Gly Gly Phe Pro Val Thr Asn Phe Trp Arg His Asp Gly Arg Phe Ile 130 135 140 Phe Phe Gln Ser Lys Gly Glu His Asp His Pro Lys Pro Glu Thr Lys 145 150 155 160 Leu Glu Ala Glu Ala Arg Arg Ala Met Lys Lys Val Asn Thr Ala Pro 165 170 175 Ser Ser Val Ser Leu Ser Leu Lys Gly Ser Thr Glu Thr Arg Ser Leu 180 185 190 Pro Gly Glu Thr Gln Ser Gln Gly Ser Leu Pro Leu Thr Trp Ser Phe 195 200 205 Gln Glu Gly Val Gln Leu Pro Gly Ser Tyr Ser Gly His Leu Ile Ala 210 215 220 Asn Thr Pro Gln Gln Asn Ser Leu Asn Asp Cys Phe Ser Phe Ser Lys 225 230 235 240 Ser Tyr Gly Leu Gly Gly Ile Thr Asp Leu Thr Asp Gln Thr Ser Thr 245 250 255 Val Asp Pro Met Lys Leu Tyr Glu Lys Arg Lys Leu Ser Ser Ser Arg 260 265 270 Thr Tyr Ser Ser Gly Asp Leu Leu Pro Pro Ser Ala Ser Gly Val Tyr 275 280 285 Ser Asp His 
Gly Asp Leu Gln Ala Trp Ser Lys Asn Ala Ala Leu Gly 290 295 300 Arg Asn His Leu Ala Asp Asn Cys Tyr Ser Asn Tyr Pro Phe Pro Leu 305 310 315 320 Thr Ser Trp Pro Cys Ser Phe Ser Pro Ser Gln Asn Ser Ser Glu Pro 325 330 335 Phe Tyr Gln Gln Leu Pro Leu Glu Pro Pro Ala Ala Lys Thr Gly Cys 340 345 350 Pro Pro Leu Trp Pro Asn Pro Ala Gly Asn Leu Tyr Glu Glu Lys Val 355 360 365 His Val Asp Phe Asn Ser Tyr Val Gln Ser Pro Ala Tyr His Ser Pro 370 375 380 Gln Gly Asp Pro Phe Leu Phe Thr Tyr Ala Ser His Pro His Gln Gln 385 390 395 400 Tyr Ser Leu Pro Ser Lys Ser Ser Lys Trp Asp Phe Glu Glu Glu Met 405 410 415 Thr Tyr Leu Gly Leu Asp His Cys Asn Asn Asp Met Leu Leu Asn Leu 420 425 430 Cys Pro Leu Arg 435 19 436 PRT Mus musculus 19 Met Glu Leu Asp Asp Phe Asp Pro Glu Asp Lys Glu Ile Leu Ser Trp 1 5 10 15 Asp Ile Asn Asp Val Lys Leu Pro Gln Asn Val Lys Thr Thr Asp Trp 20 25 30 Phe Gln Glu Trp Pro Asp Ser Tyr Val Lys His Ile Tyr Ser Ser Asp 35 40 45 Asp Arg Asn Ala Gln Arg His Leu Ser Ser Trp Ala Met Arg Asn Thr 50 55 60 Asn Asn His Asn Ser Arg Ile Leu Lys Lys Ser Cys Leu Gly Val Val 65 70 75 80 Val Cys Ser Arg Asp Cys Ser Thr Glu Glu Gly Arg Lys Ile Tyr Leu 85 90 95 Arg Pro Ala Ile Cys Asp Lys Ala Arg Gln Lys Gln Gln Arg Lys Ser 100 105 110 Cys Pro Asn Cys Asn Gly Pro Leu Lys Leu Ile Pro Cys Arg Gly His 115 120 125 Gly Gly Phe Pro Val Thr Asn Phe Trp Arg His Asp Gly Arg Phe Ile 130 135 140 Phe Phe Gln Ser Lys Gly Glu His Asp His Pro Arg Pro Glu Thr Lys 145 150 155 160 Leu Glu Ala Glu Ala Arg Arg Ala Met Lys Lys Val His Met Ala Ser 165 170 175 Ala Ser Asn Ser Leu Arg Met Lys Gly Arg Pro Ala Ala Lys Ala Leu 180 185 190 Pro Ala Glu Ile Pro Ser Gln Gly Ser Leu Pro Leu Thr Trp Ser Phe 195 200 205 Gln Glu Ser Val Gln Leu Pro Gly Thr Tyr Ser Thr Pro Leu Ile Ala 210 215 220 Asn Ala Pro Gln Gln Lys Ser Leu Asn Asp Cys Leu Ser Phe Pro Lys 225 230 235 240 Asn Tyr Asp Leu Gly Gly Ser Thr Glu Leu Glu Asp Pro Thr Ser Thr 245 250 255 Leu Asp Ser Met Lys Phe Tyr Glu Arg Cys Lys Phe Ser Ser Ser 
Arg 260 265 270 Ile Tyr Gly Ser Glu Glu Gln Phe Gln Pro Pro Val Pro Gly Thr Tyr 275 280 285 Gly Asp Tyr Glu Asp Leu Gln Thr Trp Asn Lys Asn Val Ala Leu Gly 290 295 300 Arg Asn Pro Ser Asp Asp Ile Tyr Tyr Pro Ala Tyr Pro Leu Pro Val 305 310 315 320 Ala Ser Trp Pro Tyr Asp Tyr Phe Pro Ser Gln Asn Ser Leu Glu His 325 330 335 Leu Pro Gln Gln Val Pro Ser Glu Pro Pro Ala Ala Gln Pro Gly Cys 340 345 350 His Pro Leu Trp Ser Asn Pro Gly Gly Glu Pro Tyr Glu Glu Lys Val 355 360 365 Ser Val Asp Leu Ser Ser Tyr Val Pro Ser Leu Thr Tyr His Pro Pro 370 375 380 Gln Gln Asp Pro Phe Leu Leu Thr Tyr Gly Ser Pro Thr Gln Gln Gln 385 390 395 400 His Ala Leu Pro Gly Lys Ser Asn Arg Trp Asp Phe Glu Glu Glu Met 405 410 415 Ala Cys Met Gly Leu Asp His Phe Asn Asn Glu Met Leu Leu Asn Phe 420 425 430 Cys Ser Leu Arg 435 20 504 PRT Mus musculus 20 Met Pro Ala Asp Ser Thr Gln Asp Glu Asp Ala Val Leu Ser Tyr Gly 1 5 10 15 Met Lys Leu Thr Trp Asp Ile Asn Asp Pro Gln Met Pro Gln Glu Pro 20 25 30 Thr His Phe Asp His Phe Arg Glu Trp Pro Asp Gly Tyr Val Arg Phe 35 40 45 Ile Tyr Ser Ser Gln Glu Lys Lys Ala Gln Arg His Leu Ser Gly Trp 50 55 60 Ala Met Arg Asn Thr Asn Asn His Asn Gly His Ile Leu Lys Lys Ser 65 70 75 80 Cys Leu Gly Val Val Val Cys Ala Arg Ala Cys Ala Leu Lys Asp Gly 85 90 95 Ser His Leu Gln Leu Arg Pro Ala Ile Cys Asp Lys Ala Arg Leu Lys 100 105 110 Gln Gln Lys Lys Ala Cys Pro Asn Cys His Ser Pro Leu Glu Leu Val 115 120 125 Pro Cys Arg Gly His Ser Gly Tyr Pro Val Thr Asn Phe Trp Arg Leu 130 135 140 Asp Gly Asn Ala Ile Phe Phe Gln Ala Lys Gly Val His Asp Arg Pro 145 150 155 160 Arg Pro Glu Ser Lys Ser Glu Thr Glu Gly Arg Arg Ser Ala Leu Lys 165 170 175 Arg Gln Met Ala Ser Phe Tyr Gln Pro Gln Lys Arg Arg Ser Glu Glu 180 185 190 Pro Glu Ala Arg Ser Thr Gln Asp Ile Arg Gly His Leu Asn Ser Thr 195 200 205 Ala Ala Leu Glu Pro Thr Glu Leu Phe Asp Met Thr Ala Asp Thr Ser 210 215 220 Phe Pro Ile Pro Gly Gln Pro Ser Pro Ser Phe Pro Asn Ser Asp Val 225 230 235 240 His Arg Val Thr Cys Asp Leu Pro 
Thr Phe Gln Gly Asp Ile Ile Leu 245 250 255 Pro Phe Gln Lys Tyr Pro Asn Pro Ser Ile Tyr Phe Pro Gly Pro Pro 260 265 270 Trp Gly Tyr Glu Leu Ala Ser Ser Gly Val Thr Gly Ser Ser Pro Tyr 275 280 285 Ser Thr Leu Tyr Lys Asp Ser Ser Val Val Pro Asp Asp Pro Asp Trp 290 295 300 Ile Pro Leu Asn Ser Leu Gln Tyr Asn Val Ser Ser Tyr Gly Ser Tyr 305 310 315 320 Glu Arg Thr Leu Asp Phe Thr Ala Arg Tyr His Ser Trp Lys Pro Thr 325 330 335 His Gly Lys Pro Ser Leu Glu Glu Lys Val Asp Cys Glu Gln Cys Gln 340 345 350 Ala Val Pro Thr Ser Pro Tyr Tyr Asn Leu Glu Leu Pro Cys Arg Tyr 355 360 365 Leu Pro Val Pro Ala Ala Gly Thr Gln Ala Leu Gln Thr Val Ile Thr 370 375 380 Thr Thr Val Ala Tyr Gln Ala Tyr Gln His Pro Ala Leu Lys His Ser 385 390 395 400 Asp Ser Met Gln Glu Val Ser Ser Leu Ala Ser Cys Thr Tyr Ala Ser 405 410 415 Glu Asn Leu Pro Met Pro Ile Tyr Pro Pro Ala Leu Asp Pro Gln Glu 420 425 430 Gly Val Ile Arg Ala Ala Ser Pro Ser Gly Arg Ala Pro Leu Lys Val 435 440 445 Pro Gly Asp Cys Gln Ala Pro Arg Pro Thr Leu Asp Phe Pro Gln Glu 450 455 460 Ala Asp Pro Ser Gly Thr Asp Gly Ala Asp Val Trp Asp Val Cys Leu 465 470 475 480 Ser Gly Val Gly Ser Val Met Gly Tyr Leu Asp Arg Thr Gly Gln Pro 485 490 495 Phe Ser Phe Asp Asp Glu Asp Phe 500 21 504 PRT Drosophila 21 Met Val Leu Asn Gly Met Pro Ile Thr Met Pro Val Pro Met Pro Val 1 5 10 15 Pro Met Pro Val Pro Ser Pro Pro Ala Thr Lys Ser Arg Val Ala Ile 20 25 30 Asp Trp Asp Ile Asn Asp Ser Lys Met Pro Ser Val Gly Glu Phe Asp 35 40 45 Asp Phe Asn Asp Trp Ser Asn Gly His Cys Arg Leu Ile Tyr Ser Val 50 55 60 Gln Ser Asp Glu Ala Arg Lys His Ala Ser Gly Trp Ala Met Arg Asn 65 70 75 80 Thr Asn Asn His Asn Val Asn Ile Leu Lys Lys Ser Cys Leu Gly Val 85 90 95 Leu Leu Cys Ser Ala Lys Cys Lys Leu Pro Asn Gly Ala Ser Val His 100 105 110 Leu Arg Pro Ala Ile Cys Asp Lys Ala Arg Arg Lys Gln Gln Gly Lys 115 120 125 Cys Cys Pro Asn Arg Asn Cys Asn Gly Arg Leu Glu Ile Gln Ala Cys 130 135 140 Arg Gly His Cys Gly Tyr Pro Val Thr His Phe Trp Arg Arg Asp Gly 145 150 
155 160 Asn Gly Ile Tyr Phe Gln Ala Lys Gly Thr His Asp His Pro Arg Pro 165 170 175 Glu Ala Lys Gly Ser Thr Glu Ala Arg Arg Leu Leu Ala Gly Gly Arg 180 185 190 Arg Val Arg Ser Leu Ala Val Met Leu Ala Arg Glu Ser Ala Leu Ser 195 200 205 Asp Lys Leu Ser Ser Leu Arg Pro Thr Lys Arg Gln Ala Lys Thr Gln 210 215 220 Ser Ile Gln Glu Ser Lys Arg Arg Arg Met Gly Ala Ser Asp Val Leu 225 230 235 240 Glu Thr Lys Gln Glu Leu Val Val Pro Pro Thr Thr Tyr Leu Pro Thr 245 250 255 Ser Thr Pro Thr His Ser Thr Asn Phe Asn Gln Ser Gln Gly Ser Tyr 260 265 270 Val Pro Ala Gly Gln Gly Ser Val Ile Ser Gln Trp Asn Arg Glu Ile 275 280 285 His Tyr Glu Thr Glu Asp Pro Cys Tyr Ala Asn Gly Met Tyr Ser Tyr 290 295 300 Asp Met Leu His Ser Pro Leu Ser Ala His Ser Ser Thr Gly Ser Tyr 305 310 315 320 Tyr Gln Glu Asn Lys Pro Gln Gln Leu Gln His Ser Gln Tyr Gln Gln 325 330 335 Gln Leu Ser Pro Gln Gln His Val Pro Val Ser Tyr Asp Pro Ser Gln 340 345 350 Pro Ile Ser Ser Ser Leu Gln Cys Gly Met Pro Ser Tyr Glu Ile Cys 355 360 365 Asp Asp Thr Ser Ser Leu Thr Ser Ser Ser Gly Tyr Cys Ser Glu Asp 370 375 380 Tyr Gly Tyr Tyr Asn Gly Tyr Leu Pro Asn Ser Leu Asp Val Ser Asn 385 390 395 400 Gly Ser Gln Ser Gln Asn Leu Ser Gln Asp Ala Ser Ile Tyr Thr Thr 405 410 415 Ser Ser Glu Ile Phe Ser Val Phe Glu Ser Thr Leu Asn Gly Gly Gly 420 425 430 Thr Ser Gly Val Asp Leu Ile Tyr Asp Glu Ala Thr Ala Tyr Gln Gln 435 440 445 His Gln Gln Gln Gly Thr Phe Pro His Leu Thr Asn Tyr Gln Gln Glu 450 455 460 Pro Gln Asp Gln Met Gln Ser Ala Asp Tyr Tyr Tyr Ser Asn Thr Gly 465 470 475 480 Val Asp Ser Asn Trp Asn Ile Gln Met Asp Ala Thr Tyr His Pro Val 485 490 495 Asn Ser Thr Asp Pro Ile Tyr Cys 500 22 35 PRT Mus musculus 22 Ile Leu Lys Lys Ser Cys Leu Gly Val Val Val Cys Gly Arg Asp Cys 1 5 10 15 Leu Ala Glu Glu Gly Arg Lys Ile Tyr Leu Arg Pro Ala Ile Cys Asp 20 25 30 Lys Ala Arg 35 23 47 PRT Drosophila 23 Trp Ala Met Arg Asn Thr Asn Asn His Asn Val Asn Ile Leu Lys Lys 1 5 10 15 Ser Cys Leu Gly Val Leu Val Cys Ser Gln His Cys Thr Leu 
Pro Asn 20 25 30 Gly Ser Lys Ile Asn Leu Arg Pro Ala Ile Cys Asp Lys Ala Arg 35 40 45 

What is claimed is:
 1. A polypeptide existing in common to proteins expressed from the glial cells missing (gcm) gene which is a fate determination gene of cells in animal central nervous system, which polypeptide has the amino acid sequence of Sequence ID No. 1 or a part thereof.
 2. The polypeptide of claim 1, of which one or more amino acid residues are substituted, deleted or added.
 3. A DNA sequence on animal genomic DNA to which a protein containing the polypeptide of claim 1 or 2 binds, which DNA sequence has the nucleotide sequence of Sequence ID No. 2 or a homologous sequence thereto.
 4. A vector holding the DNA sequence of claim 3.